尽管越来越受欢迎,但图形神经网络(GNN)仍然存在多个未解决的问题,包括缺乏嵌入的表现力,向遥远的节点传播信息以及大规模图的培训。了解此类问题的根源并提供解决方案需要开发分析工具和技术。在这项工作中,我们提出了可恢复性的概念,该概念衡量了随机变量中所包含的信息量,以恢复另一种形式。我们提供了一种有效的可恢复性经验估计的方法,证明了它与GNN中的信息聚集的紧密关系,并展示了如何在无监督的图表学习中使用该新概念。我们通过对各种数据集和不同GNN体系结构的广泛实验结果证明,估计的可回收性与聚集方法的表达性和图形稀疏质量相关,可以使用我们的无监督方法来学习GNN表示,并且可恢复性的正则性可缓解准确性下降,从而缓解准确性下降。 GNN深度。重现我们的实验的代码可从https://github.com/anonymons1252022/recoverability获得
translated by 谷歌翻译
权重和激活的量化是减少深神经网络(DNN)训练的计算占地面积的主要方法之一。当前方法使得4位量化的前向阶段。但是,这仅构成了培训过程的三分之一。减少整个训练过程的计算占地面积需要定量神经梯度,即相对于中间神经层的输出的损耗梯度。在这项工作中,我们研究了在量化神经网络训练中具有无偏差值的重要性,以及如何维护它,以及如何。基于此,我们建议一个$ \ texit {logarithic unbiased量化} $(luq)方法,以将前向和向后阶段量化为4位,实现最先进的导致4位训练,没有开销。例如,在Imagenet的Reset50中,我们实现了1.18%的降级。我们进一步改善了这一点以降解仅在高精度微调的单一时期与差异减少方法结合后的单一时期 - 均增加与先前建议的方法相当的开销。最后,我们建议使用低精度格式的方法来避免在训练过程的三分之二期间乘法,从而减少乘法器使用的5倍。
translated by 谷歌翻译
Neural Representations have recently been shown to effectively reconstruct a wide range of signals from 3D meshes and shapes to images and videos. We show that, when adapted correctly, neural representations can be used to directly represent the weights of a pre-trained convolutional neural network, resulting in a Neural Representation for Neural Networks (NeRN). Inspired by coordinate inputs of previous neural representation methods, we assign a coordinate to each convolutional kernel in our network based on its position in the architecture, and optimize a predictor network to map coordinates to their corresponding weights. Similarly to the spatial smoothness of visual scenes, we show that incorporating a smoothness constraint over the original network's weights aids NeRN towards a better reconstruction. In addition, since slight perturbations in pre-trained model weights can result in a considerable accuracy loss, we employ techniques from the field of knowledge distillation to stabilize the learning process. We demonstrate the effectiveness of NeRN in reconstructing widely used architectures on CIFAR-10, CIFAR-100, and ImageNet. Finally, we present two applications using NeRN, demonstrating the capabilities of the learned representations.
translated by 谷歌翻译
A core process in human cognition is analogical mapping: the ability to identify a similar relational structure between different situations. We introduce a novel task, Visual Analogies of Situation Recognition, adapting the classical word-analogy task into the visual domain. Given a triplet of images, the task is to select an image candidate B' that completes the analogy (A to A' is like B to what?). Unlike previous work on visual analogy that focused on simple image transformations, we tackle complex analogies requiring understanding of scenes. We leverage situation recognition annotations and the CLIP model to generate a large set of 500k candidate analogies. Crowdsourced annotations for a sample of the data indicate that humans agree with the dataset label ~80% of the time (chance level 25%). Furthermore, we use human annotations to create a gold-standard dataset of 3,820 validated analogies. Our experiments demonstrate that state-of-the-art models do well when distractors are chosen randomly (~86%), but struggle with carefully chosen distractors (~53%, compared to 90% human accuracy). We hope our dataset will encourage the development of new analogy-making models. Website: https://vasr-dataset.github.io/
translated by 谷歌翻译
Micron-scale robots (ubots) have recently shown great promise for emerging medical applications, and accurate control of ubots is a critical next step to deploying them in real systems. In this work, we develop the idea of a nonlinear mismatch controller to compensate for the mismatch between the disturbed unicycle model of a rolling ubot and trajectory data collected during an experiment. We exploit the differential flatness property of the rolling ubot model to generate a mapping from the desired state trajectory to nominal control actions. Due to model mismatch and parameter estimation error, the nominal control actions will not exactly reproduce the desired state trajectory. We employ a Gaussian Process (GP) to learn the model mismatch as a function of the desired control actions, and correct the nominal control actions using a least-squares optimization. We demonstrate the performance of our online learning algorithm in simulation, where we show that the model mismatch makes some desired states unreachable. Finally, we validate our approach in an experiment and show that the error metrics are reduced by up to 40%.
translated by 谷歌翻译
A master face is a face image that passes face-based identity authentication for a high percentage of the population. These faces can be used to impersonate, with a high probability of success, any user, without having access to any user information. We optimize these faces for 2D and 3D face verification models, by using an evolutionary algorithm in the latent embedding space of the StyleGAN face generator. For 2D face verification, multiple evolutionary strategies are compared, and we propose a novel approach that employs a neural network to direct the search toward promising samples, without adding fitness evaluations. The results we present demonstrate that it is possible to obtain a considerable coverage of the identities in the LFW or RFW datasets with less than 10 master faces, for six leading deep face recognition systems. In 3D, we generate faces using the 2D StyleGAN2 generator and predict a 3D structure using a deep 3D face reconstruction network. When employing two different 3D face recognition systems, we are able to obtain a coverage of 40%-50%. Additionally, we present the generation of paired 2D RGB and 3D master faces, which simultaneously match 2D and 3D models with high impersonation rates.
translated by 谷歌翻译
The use of needles to access sites within organs is fundamental to many interventional medical procedures both for diagnosis and treatment. Safe and accurate navigation of a needle through living tissue to an intra-tissue target is currently often challenging or infeasible due to the presence of anatomical obstacles in the tissue, high levels of uncertainty, and natural tissue motion (e.g., due to breathing). Medical robots capable of automating needle-based procedures in vivo have the potential to overcome these challenges and enable an enhanced level of patient care and safety. In this paper, we show the first medical robot that autonomously navigates a needle inside living tissue around anatomical obstacles to an intra-tissue target. Our system leverages an aiming device and a laser-patterned highly flexible steerable needle, a type of needle capable of maneuvering along curvilinear trajectories to avoid obstacles. The autonomous robot accounts for anatomical obstacles and uncertainty in living tissue/needle interaction with replanning and control and accounts for respiratory motion by defining safe insertion time windows during the breathing cycle. We apply the system to lung biopsy, which is critical in the diagnosis of lung cancer, the leading cause of cancer-related death in the United States. We demonstrate successful performance of our system in multiple in vivo porcine studies and also demonstrate that our approach leveraging autonomous needle steering outperforms a standard manual clinical technique for lung nodule access.
translated by 谷歌翻译
The field of emergent communication aims to understand the characteristics of communication as it emerges from artificial agents solving tasks that require information exchange. Communication with discrete messages is considered a desired characteristic, for both scientific and applied reasons. However, training a multi-agent system with discrete communication is not straightforward, requiring either reinforcement learning algorithms or relaxing the discreteness requirement via a continuous approximation such as the Gumbel-softmax. Both these solutions result in poor performance compared to fully continuous communication. In this work, we propose an alternative approach to achieve discrete communication -- quantization of communicated messages. Using message quantization allows us to train the model end-to-end, achieving superior performance in multiple setups. Moreover, quantization is a natural framework that runs the gamut from continuous to discrete communication. Thus, it sets the ground for a broader view of multi-agent communication in the deep learning era.
translated by 谷歌翻译
在视频分析中,背景模型具有许多应用,例如背景/前景分离,变更检测,异常检测,跟踪等。但是,尽管在静态相机捕获的视频中学习这种模型是一项公认的任务,但在移动相机背景模型(MCBM)的情况下,由于算法和可伸缩性挑战,成功率更加重要。由于相机运动而产生。因此,现有的MCBM在其范围和受支持的摄像头类型的限制中受到限制。这些障碍还阻碍了基于深度学习(DL)的端到端解决方案的这项无监督的任务。此外,现有的MCBM通常会在典型的大型全景图像或以在线方式的域名上建模背景。不幸的是,前者造成了几个问题,包括可扩展性差,而后者则阻止了对摄像机重新审视场景先前看到部分的案例的识别和利用。本文提出了一种称为DEEPMCBM的新方法,该方法消除了上述所有问题并实现最新结果。具体而言,首先,我们确定与一般和DL设置的视频帧联合对齐相关的困难。接下来,我们提出了一种新的联合一致性策略,使我们可以使用具有正则化的空间变压器网,也不是任何形式的专业化(且不差异)的初始化。再加上在不破坏的稳健中央矩(从关节对齐中获得)的自动编码器,这产生了一个无端到端的无端正规化MCBM,该MCBM支持广泛的摄像机运动并优雅地缩放。我们在各种视频上展示了DEEPMCBM的实用程序,包括超出其他方法范围的视频。我们的代码可在https://github.com/bgu-cs-vil/deepmcbm上找到。
translated by 谷歌翻译
本文提出了2022年访问量的挑战的最终结果。 OOV竞赛介绍了一个重要方面,而光学角色识别(OCR)模型通常不会研究,即,在培训时对看不见的场景文本实例的识别。竞赛编制了包含326,385张图像的公共场景文本数据集的集合,其中包含4,864,405个场景文本实例,从而涵盖了广泛的数据分布。形成了一个新的独立验证和测试集,其中包括在训练时出词汇量不超出词汇的场景文本实例。竞争是在两项任务中进行的,分别是端到端和裁剪的文本识别。介绍了基线和不同参与者的结果的详尽分析。有趣的是,在新研究的设置下,当前的最新模型显示出显着的性能差距。我们得出的结论是,在此挑战中提出的OOV数据集将是要探索的重要领域,以开发场景文本模型,以实现更健壮和广义的预测。
translated by 谷歌翻译